Abstract

A large body of studies into individual differences in second language learning
has shown that success in second language learning is strongly affected by a
set of relevant learner characteristics ranging from the age of onset to
motivation, aptitude, and personality. Most studies have concentrated on a limited
number of learner characteristics and have argued for the relative importance
of some of these factors. Clearly, some learners are more successful than
others, and it is tempting to try to find the factor or combination of factors that
can crack the code to success. However, isolating one or several global
individual characteristics can only give a partial explanation of success in second
language learning. The limitation of this approach is that it only reflects
rather general personality characteristics of learners at one point in time,
while both language development and the factors affecting it are instances of
complex dynamic processes that develop over time. Factors that have been
labelled as "individual differences," as well as the development of proficiency,
are characterized by nonlinear relationships in the time domain, which is why
the rate of success cannot simply be deduced from a combination of factors.
Moreover, in the complex dynamic systems theory (CDST) literature it has been
argued that generalizations about the interaction of variables across
individuals are not warranted once we acknowledge that language development is
essentially an individual process (Molenaar, 2015). In this paper, the viability
of such generalizations is investigated by exploring the L2 development over
time of two identical twins in Taiwan who can be expected to be highly similar
in all respects, from their environment to their level of English proficiency,
their exposure to English, and their individual differences. In spite of the
striking similarities between these learners, the development of their L2
English over time was very different. Developmental patterns for spoken and
written language even showed opposite tendencies. These observations
underline the individual nature of the process of second language development.

Keywords: individual differences; second language development; complex
dynamic systems; variability; process study


1. Factors to predict L2 success in group studies

If there is one issue that the majority of researchers in second language
acquisition agree on, it is the observation that individual differences (IDs) between
learners are statistically associated with success in second language learning.
Differences between individuals such as motivation, aptitude, and age have
traditionally been treated as influential factors affecting success in second
language learning. Within the long-standing tradition of ID research in psychology,
many studies have focused on understanding the causes of the differences between
individuals in relation to learning achievement. The attention in the literature
to IDs in second language development, most notably to aptitude and motivation,
is still increasing. Research output on the effect of motivation alone has surged
over the past ten years, from 33 to 138 publications, more than half of which
appeared in top peer-reviewed journals in the field (Boo, Dornyei, & Ryan, 2015).
With ever more sophisticated statistical analyses, studies have attempted to
identify the IDs that most accurately predict success in learning. For instance,
Gardner, Tremblay, and Masgoret (1997) used structural equation modelling to
identify the relative importance of a large number of IDs and explored the causal
relationships between them. Using a causal modelling approach in which they
simultaneously evaluated the relationships among a large number of IDs, they
showed that motivation most strongly predicts achievement in the L2 (.48),
followed by aptitude (.47), while confidence is most strongly loaded by
achievement (.60). These and other studies focusing on the relative importance of
IDs seem to agree that aptitude (the "talent for language learning") is one of
the most promising factors (prediction of success is .50), followed by motivation
(.40). Another influential factor turns out to be the age of onset (.50). The
relatively high correlations between success in the L2 (based either on grades or
on self-assessment) and motivation are reliable and consistent, as was shown by
Masgoret and Gardner (2003), who carried out a meta-analysis of 75 independent
samples. Multiple regression analyses show that the combination of aptitude and
motivation, which hardly overlap with each other, leads to even better prediction
of success (.60). The statistical analyses have progressed from simple correlations
to more advanced types of analysis. For instance, using hierarchical regression
analyses to determine the effect of musical ability on L2 proficiency, Slevc and
Miyake (2006) found that musical ability contributes to receptive L2 phonology
(.37), while age of arrival is the most important factor in predicting lexical
knowledge (-.42). A more recent approach to investigating the relative importance
of IDs is the use of mixed-effects modelling techniques, in which IDs are
"neutralized" by including the individual as a random factor in the analysis
(Kozaki & Ross, 2011; Tremblay, Derwing, Libben, & Westbury, 2011; see also
Cunnings, 2012, and Linck & Cunnings, 2015).

In spite of all these promising developments, however, there have also been
critical views on the relevance of IDs. For instance, Dornyei (2009) refers to IDs as
a "myth" and argues that they do not exist as identifiable factors that can
contribute to success in second language learning. He disputes the major assumption
that learner-internal variables are independent of the environment, arguing
that IDs are not distinctly definable, not stable, and not monolithic; in addition,
they are strongly dependent on time and context. Dornyei (2010) also argues that
the distinction between motivation and aptitude is untenable, as illustrated by
the concept of "flow," a balanced mixture of motivation and aptitude, which
suggests that the boundary between the two is artificial. Arguments about
the non-monolithic nature of IDs have been worked out in more detail in Dornyei
and Ryan (2015), who convincingly show that the classical approach to IDs may
be intuitively appealing but does not provide a realistic representation of how
second language development varies as a function of time and context.

Several studies have shown that IDs are far from stable over time. Jiang
and Dewaele (2015) investigated several aspects of motivation at three moments
in time. Their analyses revealed a complex picture of the ideal and ought-to
L2 selves, which changed over time and were affected by various motivational
variables. Significant changes occurred over the year in the ideal L2 self and the
ought-to L2 self and in their relationship with other motivational factors. "The
nonlinear changes in Ideal/Ought-to L2 self," they show, "were consistent with
the basic dynamic features of self-concept" (Jiang & Dewaele, 2015, p. 349). This
study clearly showed that several ID variables interact and change over time (see
Figure 1). The variable nature of IDs over time was also found in a study by
Waninge, Dornyei, and de Bot (2014) on motivational dynamics during a Spanish
lesson. Even at a short timescale, which was the focus of their study (5-minute
steps), motivation was highly variable and showed unique patterns of variability
for different individuals. We can conclude from these studies that
IDs change over time at different timescales. One proposal is to redefine IDs in
a more dynamic framework, as is done by Dornyei (2009, 2010). From a complex
dynamic systems theory (CDST) perspective, Dornyei argues, higher-order ID
variables can be seen as attractors that act as stabilizing forces in the
developmental process. He considers ID variables in the framework of cognition,
motivation, and affect, and introduces factors like "possible selves" to represent
individually motivated change over time. However, he also argues that there can
never be a direct causal effect between these attractor states and L2 learning.



[Figure 1: ought-to-self scores per participant; legend: F11, F35, F72, F82, F85]

Figure 1 Variability in ought-to-self scores (1-5) for five individual participants
(F11-F85) over a period of 12 months divided over three measurements (from
Jiang & Dewaele, 2015)

2. Group studies versus individual case studies

In addition to the fact that IDs are not stable or clearly delineable and may change as
a function of time, there is another, more serious statistical limitation to many
current ID approaches. Most, if not all, ID studies have focused on inter-individual
variation and use Gaussian statistics to draw conclusions about IDs based on
group measures. However, such generalizations do not take into account the
individual's process of development over time. This point is clearly explained by
Molenaar (2015), who refers to Cattell's (1952) data box. In most research,
essentially two dimensions are investigated (see Figure 2). Along the first
dimension, researchers investigate how different variables (say motivation, aptitude,
and language achievement) are statically related by generalizing over observations
across individuals (inter-individual variation, which we will refer to as variation).
Along the second dimension, the relationships between variables can be described
for one individual case as they emerge over time (intra-individual variation, which
we will refer to as variability). Molenaar (2015) shows that the combination of
heterogeneity across subjects and heterogeneity in time violates the assumptions
required for generalization.

Although statistical techniques continue to be refined, most statistics
currently used do not allow for generalizations across variables for different
individuals in the time domain, and the analysis used is essentially a choice
between the two dimensions. Molenaar argues that there is no relation
between results obtained in statistical analyses of group data at one moment in
time and an individual's development as it emerges over time, so data on the
interaction of variables based on groups of individuals at one point in time cannot
say anything about individual development over time, and vice versa. Since most
IDs have been demonstrated to be unstable and to change over time, the analysis of
variation will need to be complemented by analyses of variability over time.



[Figure 2: Cattell's data cube; recovered axis label: "Single Occasion (Age)"]

Figure 2 Cattell's cube illustrating the dimensions of data analysis (Molenaar, 2015)
The analysis of variability is important if we are genuinely interested in
how language changes over time in relation to IDs. In such cases, Molenaar
(2015) argues in favour of subject-specific data analysis for person-oriented
processes. Since language development can clearly be classified as an individual,
person-oriented process, the combination of interacting variables and changing
development must be seen as separate dimensions. One line of research is to
focus on interacting variables for groups of learners, ignoring the time
dimension. Virtually all studies on IDs in L2 learning have followed this line.
Therefore, it is important to complement ID group studies with variability studies in
which individual "differences" are set aside and the focus is on the development over
time of individual learners.

3. Degrees of variability to predict L2 success in individual learners

Thelen and Smith (1994) argue that there is no single direct cause for new
behavior, but that it emerges from the confluence of different subsystems, and
variability will occur in some of these subsystems because it is necessary to drive the
developmental process, as it allows the learner to explore and select. Because
variability reflects the system's adaptability to the environment and signals the
process of self-organization after perturbations of the system, it is a sign of
development. From a more formal perspective, systems have to become
"unstable" before they can change (Hosenfeld, Van der Maas, & Van den Boom,
1997). Accordingly, high intra-individual variability implies that qualitative
developmental changes may be taking place (Lee & Karmiloff-Smith, 2002). The
cause-and-effect relationship between variability and development is considered
to be reciprocal. On the one hand, variability permits flexible and adaptive
behavior and is a prerequisite for development. (Just as in evolution theory,
there is no selection of new forms if there is no variation.) On the other hand,
free exploration of performance generates variability: trying out new tasks
leads to instability of the system and consequently to an increase in variability.
Variability is especially large during periods of rapid development, because at
that time the learner explores and tries out new strategies or modes of behavior
that are not always successful (Thelen & Smith, 1994). Therefore, the claim is
that stability and variability are indispensable aspects of human development that
should be part of any analysis.

When we apply CDST insights to language development, we may assume
the following: A first or second language is a complex dynamic system consisting
of many subsystems, such as the sound system, the grammar system, the lexical
system, and so on, all of which are interrelated and may influence each other.
Many internal states, such as language aptitude, motivation, attitude, personality
traits, and other "individual differences," have an effect on the developmental
trajectory. The developmental path may further be affected by external states or
events such as the general context in which a language is learned, a particular
teacher, an illness, and other conditions at any given moment. All these
dynamically interrelated factors may cause any part of the learner's language system
to fluctuate from one moment to the next. Such fluctuations are normal for
any (sub)system, even one that has stabilized to some extent. However, strong
fluctuations may indicate that a (sub)system is changing.

Learning is not linear: In both first and second language development,
some subsystems may develop slowly at first, then suddenly take off, and
level off at the end. Other subsystems may develop in completely different ways.
In either case, the interaction of developing subsystems will be manifested in a great
deal of variability in the learner's language. Because learners may have different
starting points and learning contexts, variation among learners is also bound to
exist. A large number of studies (cf. Bulte, 2013; Byrnes, 2009; Caspi, 2010;
Larsen-Freeman, 2006; Murakami, 2013; Tilma, 2014; van Geert, 2008;
Verspoor, Lowie, & van Dijk, 2008; Vyatkina, 2012) have now traced individual
learners and shown that each learner has a unique developmental
trajectory, with high degrees of variability and changes in variability
patterns. Without explicitly mentioning it, these studies have concentrated on one
individual slice of Cattell's cube, showing how variables interact in the time
dimension of that individual. In these longitudinal, process-based studies with
dense data, it has been found that different degrees of variability may indicate
different degrees of development. For instance, high initial within-subject
variability tends to be positively related to subsequent learning, and such learning
reflects the addition of new strategies, greater reliance on relatively advanced
strategies already being used, improved choices among strategies, and new
ways to execute existing strategies (Verspoor, Lowie, & van Dijk, 2008). For
example, on a number of conservation and sort-recall tasks, children who used
more and different strategies on the pre-test used more advanced strategies on
subsequent tasks (Coyle & Bjorklund, 1997; Siegler as cited in Siegler, 2006).

These studies have concentrated on single learners; no study so far has
compared two learners to explore to what extent the degree of variability in
development over time may be related to interacting variables. According to
Cattell's separation of dimensions as explained by Molenaar (2015), interacting
variables in the time dimension are not likely to be identical for different
individuals. If the similarity between learners in the time dimension is to be
investigated, this will have to be done with very similar learners to minimize the
myriad of factors that may affect the degree of variability, such as differences in
initial conditions, differences in personality and other "individual differences,"
and differences in external factors such as the kind and amount of exposure.
Therefore, we focus on the developmental pattern of identical twins who have grown up
in an identical environment and have been exposed to identical L2 input.

4. A case study of identical twins 

As in Chan, Verspoor, and Vahtrick (2015), we compare identical twins, who
were very similar in many respects. They live in the same home and have
attended the same school in the same class. Most traditional twin studies
investigate the effect of genetic factors by comparing monozygotic (MZ, or identical)
twin pairs with dizygotic (DZ, or fraternal) twins (Segal, 2010; Stromswold,
2006). The current study does not focus on the genetic effect and does not
compare the two types of twins, but examines only one pair of MZ twins. The
majority of twin studies focusing on linguistics have found identical twins to perform
more similarly than fraternal twins, which supports the relevance of the identical
genetic makeup of the twins in the current study (Stromswold, 2006). In stating that the
participants are identical twins, we are not invoking the much-maligned equal
environments assumption (Plomin, DeFries, McClearn, & McGuffin, 2008), which
holds that MZ and DZ twins share equal environments, so that any significantly closer
developmental patterns found in MZ twins must be due to genetics. Instead, we
merely assume that twins who share 100% of their genes and who have been
raised in an identical environment are more likely than any other pair of learners
to exhibit similar developmental patterns (Hayiou-Thomas, 2008). Chan et al.
(2015) investigated the twins' developmental stages on several syntactic complexity
measures in both their speaking and their writing to see whether the sequences
of observed developments in writing and speaking occur simultaneously or in a
different order, and whether the twins develop in a similar manner. They found
that abilities tapped by different measures developed in the spoken language
before the written language and that the stages in the twins were not the same.

In the current paper, we will re-examine the data to answer our main research
questions:

1. Can the degrees of variability in individuals be associated with L2 success 
in individual learners? 

2. Can similar interactions of variables be detected in the developmental 
patterns of two highly similar individuals? 

To be able to answer these questions, we will first investigate the development
of one lexical and one syntactic variable in both written and spoken free
production tasks. The specific sub-questions pertain to each of the four resulting
variables (MLT/written, MLT/spoken, VocD/written, VocD/spoken):
1. Is there a difference between the average scores of the twins?

2. Is there a significant increase or decrease in each individual time series 
of scores? 

3. Is there a difference in the amount of variability between the twins for 
all variables? 

4. Is there a changing slope in the range of variability in the time series of 
each of the learners? 

5. Method 

5.1 Participants

Gloria and Grace (not their real names) are two female identical twins, aged 15
at the time of the study. For ten years, they attended school in Taiwan in the
same English class with the same English teacher, where English classes were
taught in Chinese with a focus on grammar. In other words, until the current
study began, they had received mainly written input in English. At the
beginning of the study, they had a very similar English proficiency level (see Table
1) as measured by the General English Proficiency Test (GEPT; Wu, 2012).

Table 1 English proficiency scores (GEPT) for the twins

Skill (maximum score)    Grace   Gloria
Listening (120)            112      112
Speaking (100)              80       80
Reading (120)              108      105
Writing (100)               88       82


An informal personality test, the Big Five test,¹ administered at the onset of
the experiment, showed that the two girls also had similar personalities: both
scored fairly high on extraversion (sociable, friendly, and talkative). The
individual scores for the participants are presented in Table 2.

Table 2 Big Five personality test for the twins (percentiles)

Trait                               Gloria   Grace
Openness to Experience/Intellect         7      10
Conscientiousness                       21       8
Extraversion                            79      79
Agreeableness                           22      10
Neuroticism                             32      27

Trait descriptions:
Openness to Experience/Intellect: High scorers tend to be original, creative,
curious, complex; low scorers tend to be conventional, down to earth,
characterized by narrow interests, uncreative.
Conscientiousness: High scorers tend to be reliable, well-organized,
self-disciplined, careful; low scorers tend to be disorganized, undependable,
negligent.
Extraversion: High scorers tend to be sociable, friendly, fun loving, talkative;
low scorers tend to be introverted, reserved, inhibited, quiet.
Agreeableness: High scorers tend to be good-natured, sympathetic, forgiving,
courteous; low scorers tend to be critical, rude, harsh, callous.
Neuroticism: High scorers tend to be nervous, high-strung, insecure, worrying;
low scorers tend to be calm, relaxed, secure, hardy.

1 The Big Five personality test is available at http://www.outofservice.com/bigfive/


5.2 Materials

During the data collection period, the participants produced oral and written
texts approximately three times a week, usually on Friday, Saturday, and
Sunday. For each participant, 100 oral texts and 100 written texts were
gathered. The topics, selected by one of the researchers from standard TOEFL
test topics, were all of the same genre. All the topics were presented to the
two participants at the beginning of the study. Examples of the topics for writing
and speaking are given below.

Example of a speaking topic:

"Which of the following statements do you agree with? Some believe that TV
programs have a positive influence on modern society. Others, however, think that the
influence of TV programs is negative. What TV programs have a positive influence?
Why? What TV programs have a negative influence? Why?"

Example of a writing topic: 

"Do you agree or disagree with the following statement? With the help of technology, 
students nowadays can learn more information and learn it more quickly. Use specific 
reasons and examples to support your answer." 

In order to motivate and remind the participants to obtain extra exposure to
English and to do the speaking and writing tasks, one of the researchers created a
private Facebook group for the project, to which only the researcher, the
participants, and their parents had access. The researcher reminded the twins every
week to record themselves and to write the texts. Recordings were sent by
email, and the written texts were posted in the Facebook group. To keep the
participants motivated, the researcher reacted to the content of each text,
but no corrective feedback on form was given on either the oral or the written texts.

All texts were prepared for automatic processing with Lu's automatic syntactic
complexity analyzer (Lu, 2010). The analyzer is designed to investigate
syntactic complexity in second language writing and computes 14 indices of
syntactic complexity (see Lu, 2010, p. 479).

For our study, we used mean length of T-unit as the syntactic complexity measure.
A T-unit is defined as "one main clause plus any subordinate clause or non-clausal
structure that is attached to or embedded in it" (Hunt, 1970, p. 4). A dependent
clause is defined as a finite adjective, adverbial, or nominal clause, while non-finite
verb phrases are excluded from the definition of clauses (e.g., Bardovi-Harlig &
Bofman, 1989).
All oral texts (each about 200 words in length) were first transcribed by the
researcher. To avoid redundancy in the oral production, filled pauses, disfluencies
(e.g., repetitions, restarts, and repairs), and utterances that did not involve
linguistic meaning or form (e.g., laughter) were excluded. Then both the oral and
written data were pre-processed for the analyzer, mainly to enable correct
calculations, for instance by correcting punctuation. All other errors were left
unchanged to keep the data as original as possible. After pre-processing, the text
files were submitted one by one to the automatic processing tool to obtain the
value of the syntactic measure for each observation (mean length of T-unit, MLT).
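As a minimal illustration of the MLT arithmetic, the Python sketch below assumes that the T-units have already been identified; in the study this segmentation is performed by Lu's analyzer, which parses each text, and the sketch does not attempt that step. MLT is then the number of words divided by the number of T-units. The whitespace tokenizer and the `FILLED_PAUSES` set are hypothetical simplifications for illustration only.

```python
from typing import List

# Hypothetical list of filled pauses to drop; in the study these were
# removed manually during transcription, not by code.
FILLED_PAUSES = {"uh", "um", "er", "erm"}


def clean_tokens(t_unit: str) -> List[str]:
    """Split a T-unit on whitespace, strip edge punctuation, drop filled pauses."""
    words = [w.strip(".,!?;:\"'").lower() for w in t_unit.split()]
    return [w for w in words if w and w not in FILLED_PAUSES]


def mean_length_of_t_unit(t_units: List[str]) -> float:
    """MLT = total number of words / number of T-units.

    `t_units` is assumed to hold one already segmented T-unit per string.
    """
    if not t_units:
        raise ValueError("at least one T-unit is required")
    total_words = sum(len(clean_tokens(t)) for t in t_units)
    return total_words / len(t_units)


# Two T-units with 10 and 4 words (the filled pause "Um" is dropped).
sample = [
    "I think TV is useful because it teaches us things.",
    "Um some programs are bad.",
]
print(mean_length_of_t_unit(sample))  # 7.0
```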

For lexical diversity, we used VocD (Malvern, Richards, Chipere, & Durán,
2004, p. 47). VocD is an adjusted version of the type/token ratio (TTR) that is
standardized for text length. In view of the differences in text length in the
data, some texts being relatively short, VocD was used as a more reliable
measure of lexical diversity than the raw TTR. VocD is based on the following
equation, which models how the TTR falls as the number of tokens N increases:

$$\mathrm{TTR} = \frac{D}{N}\left[\left(1 + 2\frac{N}{D}\right)^{1/2} - 1\right]$$

VocD is the single parameter D of this function, which models the falling
TTR curve. The higher the D, the greater the diversity of a text, independent of
text length. A computer program called VocD in CLAN (MacWhinney, 2000)
provides a standardized procedure for measuring D (see Malvern et al., 2004).
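The idea behind the D estimate can be sketched roughly in Python: draw random samples of increasing size from a text, average their TTRs, and find the D for which the model curve TTR = (D/N)((1 + 2N/D)^(1/2) - 1) best fits those averages. This is a simplified illustration, not the CLAN implementation. The sample sizes (35 to 50 tokens) and 100 samples per size follow the commonly described vocd settings, while the grid search over D between 1 and 200 and the single run (CLAN averages several runs) are our own assumptions. All function names are hypothetical.

```python
import math
import random


def predicted_ttr(n_tokens: int, d: float) -> float:
    """Model TTR for a sample of n_tokens tokens: (D/N) * (sqrt(1 + 2N/D) - 1)."""
    return (d / n_tokens) * (math.sqrt(1 + 2 * n_tokens / d) - 1)


def empirical_ttr_curve(tokens, sizes=range(35, 51), trials=100, seed=0):
    """Average TTR of `trials` random samples (without replacement) per size."""
    rng = random.Random(seed)
    curve = {}
    for n in sizes:
        total = 0.0
        for _ in range(trials):
            sample = rng.sample(tokens, n)
            total += len(set(sample)) / n
        curve[n] = total / trials
    return curve


def fit_d(curve, d_min=1.0, d_max=200.0, step=0.1):
    """Grid-search the D minimising squared error between model and curve.

    The grid bounds and step are hypothetical choices for this sketch.
    """
    n_steps = int(round((d_max - d_min) / step))
    best_d, best_err = d_min, float("inf")
    for i in range(n_steps + 1):
        d = d_min + i * step
        err = sum((ttr - predicted_ttr(n, d)) ** 2 for n, ttr in curve.items())
        if err < best_err:
            best_d, best_err = d, err
    return best_d


def vocd_sketch(tokens, seed=0):
    """Simplified single-run estimate of D for a list of word tokens."""
    if len(tokens) < 50:
        raise ValueError("at least 50 tokens are needed")
    return fit_d(empirical_ttr_curve(tokens, seed=seed))
```

Because the model TTR rises monotonically with D for a fixed sample size, a text that repeats a few words receives a low D, and a text with many different words receives a higher one, regardless of its length.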

5.3 Procedure

For this longitudinal study, 100 written and 100 spoken language samples were
collected over a period of eight months. In a different study that used these
same data (Chan et al., 2014), the effect of input on vocabulary knowledge was
investigated. For this purpose, the input condition was manipulated in three stages:
a stage of relatively low input was followed by a stage of high input and then
another stage of low input. According to the self-reports in the participants'
diaries, they obtained about 2 to 5 hours per week of extra input until Data point 20,
5 to 15 hours per week until Data point 56, and again 2 to 5 hours per week until
the last data point. Although the manipulation is not relevant for the study
reported here, it does illustrate that the two participants were exposed to
virtually identical input during the period of data collection.

5.4 Data analysis 

First, we averaged the scores of each data series (MLT/written, MLT/spoken,
VocD/written, VocD/spoken) to see if there was a difference between the girls
across the entire trajectory. Second, we tested for each girl whether there was
a significant increase (or decrease) in the score over time. Then we looked at
the degree of variability in the data, aiming to discover whether there was a
difference in global variability (see below) between the two girls across the
entire trajectory. Finally, we tested whether patterns of variability changed over
time. More explicitly, we were interested to see whether there was a
significantly greater degree of variability early on than towards the end, or vice
versa, and whether there was a global trend in the amount of variability across time.

In order to test the significance of the observed differences between the
girls and of increases or decreases within each time series, Monte Carlo
permutation analyses were performed. This is a statistical testing procedure that
estimates probabilities by repeatedly resampling the dataset in a way that is
consistent with the null hypothesis and comparing the empirically found values with
the values produced by this random resampling procedure. If the probability of
finding the observed value in the output of the resampling procedure is very low
(in this case below 5%), the result is considered to differ significantly from the
null hypothesis model. (For more information on the use of permutation tests,
see Todman & Dugard, 2001.)

In the current data, the Monte Carlo analysis was used (a) to test whether
there was a difference between Grace and Gloria in the mean level of
MLT/spoken, MLT/written, VocD/spoken, and VocD/written, (b) to test whether
there was a significantly increasing or decreasing slope in each individual time
series of scores, (c) to test whether there was a difference in the amount of
variability between Gloria and Grace for all variables, and (d) to test whether there
was a significantly increasing or decreasing slope in the variability (range) of
each time series. All analyses were performed in Excel in combination with
PopTools (Hood, 2004).

For the first Monte Carlo test, our testing criteria were the differences in
the mean scores of MLT/spoken, MLT/written, VocD/spoken, and VocD/written
between Gloria and Grace (Gloria's mean minus Grace's mean). We reshuffled
the data of the two participants across each other (5,000 times) to create
resampled time series. This simulates results for the null hypothesis that there
is no difference between the girls. From these simulated time series, we
computed the difference between the two participants again and compared these
simulated differences to the empirically found differences.
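The first test can be sketched in Python as follows. How exactly the two series were "reshuffled across each other" in PopTools is not specified, so this sketch assumes one plausible reading: at each time point the two girls' scores are swapped with probability 0.5, which keeps the time order intact. The p-value is the proportion of the 5,000 simulated differences whose absolute value is at least as large as the observed one; using absolute values (a two-sided test) is also our assumption. Function and parameter names are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def swap_permutation_test(series_a, series_b, statistic=mean, n_iter=5000, seed=1):
    """Monte Carlo test of statistic(A) - statistic(B).

    Assumed null model: at each time point the two participants' scores are
    exchangeable, so they are swapped with probability 0.5. Returns the
    observed difference and the proportion of simulated differences that are
    at least as large in absolute value.
    """
    if len(series_a) != len(series_b):
        raise ValueError("both series must have the same length")
    rng = random.Random(seed)
    observed = statistic(series_a) - statistic(series_b)
    extreme = 0
    for _ in range(n_iter):
        sim_a, sim_b = [], []
        for a, b in zip(series_a, series_b):
            if rng.random() < 0.5:
                a, b = b, a
            sim_a.append(a)
            sim_b.append(b)
        if abs(statistic(sim_a) - statistic(sim_b)) >= abs(observed):
            extreme += 1
    return observed, extreme / n_iter
```

With Gloria's series as `series_a` and Grace's as `series_b`, a negative observed difference means that Grace's mean is higher.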

For the second Monte Carlo test, the procedure was highly similar to the
first, but in this case we took the global variability of each time series as the
testing criterion. This global variability was determined as the average of a moving
range across five data points. This means that we took a moving window of five
consecutive data points and calculated the local range (the maximum value in the
window minus the minimum value in the window). The average of this moving
range was compared to the average of the moving range of simulated time
series, based on the null hypothesis that there are no differences between the girls
(see above). For both tests, we considered the difference to be significant when
the probability that the reshuffled data produce the same (or a larger) difference
between Gloria and Grace as the observed difference is less than 5%.
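The moving-range measure of global variability, and a Monte Carlo comparison of it between two participants, might look like the Python sketch below. The five-point window comes from the text; the null model (scores swapped between the participants at each time point with probability 0.5) and the two-sided comparison are assumptions about how the reshuffling was done, and all names are hypothetical.

```python
import random


def moving_ranges(series, window=5):
    """Local range (max - min) within each window of `window` consecutive points."""
    if len(series) < window:
        raise ValueError("series is shorter than the window")
    return [max(series[i:i + window]) - min(series[i:i + window])
            for i in range(len(series) - window + 1)]


def global_variability(series, window=5):
    """Average of the moving ranges, i.e. the global variability of a series."""
    ranges = moving_ranges(series, window)
    return sum(ranges) / len(ranges)


def variability_difference_test(series_a, series_b, window=5, n_iter=5000, seed=2):
    """Monte Carlo test of global_variability(A) - global_variability(B).

    Assumed null model: scores are swapped between the two participants at
    each time point with probability 0.5.
    """
    if len(series_a) != len(series_b):
        raise ValueError("both series must have the same length")
    rng = random.Random(seed)
    observed = global_variability(series_a, window) - global_variability(series_b, window)
    extreme = 0
    for _ in range(n_iter):
        sim_a, sim_b = [], []
        for a, b in zip(series_a, series_b):
            if rng.random() < 0.5:
                a, b = b, a
            sim_a.append(a)
            sim_b.append(b)
        simulated = global_variability(sim_a, window) - global_variability(sim_b, window)
        if abs(simulated) >= abs(observed):
            extreme += 1
    return observed, extreme / n_iter


print(moving_ranges([1, 2, 3, 4, 5, 6]))  # [4, 4]
```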

The third and fourth Monte Carlo tests are based on the trend of each
individual data series. The testing criteria were the linear slopes of each of these
series. For the third test, we computed the slope of each data series and
compared these to the slopes of simulated individual time series. These are based on
5,000 reshuffles of each data series across time. This simulates the null
hypothesis that the data points are independent of time. The fourth test follows the
same procedure, but here the slope is based on the values of the moving ranges
(with a moving window of five data points) that were computed to estimate
the local variability. This slope shows whether this "local" variability is increasing
or decreasing over time. For both tests, we considered the result to be significant
when the probability that the reshuffled data produced a slope similar to or larger
than the observed slope is less than 5%.
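Both trend tests can be sketched with one hypothetical helper that reshuffles a single series across time and recomputes a slope criterion. The sketch assumes an ordinary least-squares slope against the measurement index, a two-sided comparison, and, for the fourth test, that the raw series is reshuffled and its moving ranges are recomputed for each reshuffle (reshuffling the moving ranges themselves would be an alternative reading).

```python
import random
from statistics import mean


def ols_slope(values):
    """Least-squares slope of the values against measurement index 0, 1, 2, ..."""
    xs = list(range(len(values)))
    x_bar, y_bar = mean(xs), mean(values)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def moving_ranges(values, window=5):
    """Local range (max - min) for every window of consecutive measurements."""
    return [max(values[i:i + window]) - min(values[i:i + window])
            for i in range(len(values) - window + 1)]


def level_slope(values):
    """Criterion of the third test: trend in the scores themselves."""
    return ols_slope(values)


def variability_slope(values, window=5):
    """Criterion of the fourth test: trend in the local ranges."""
    return ols_slope(moving_ranges(values, window))


def monte_carlo_slope_test(values, criterion, n_resamples=5000, seed=1):
    """Reshuffle one series across time and compare slopes (two-sided).

    Null hypothesis: the data points are independent of time.
    Returns (observed slope, p-value).
    """
    rng = random.Random(seed)
    observed = criterion(values)
    shuffled = list(values)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(shuffled)
        if abs(criterion(shuffled)) >= abs(observed):
            hits += 1
    return observed, hits / n_resamples
```

Passing `level_slope` runs the third test and passing `variability_slope` runs the fourth; the sign of the observed slope gives the direction of the trend.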

To analyze the relation between the performances of the two girls and
between the individual linguistic variables, we also performed Pearson correlation
analyses. These are based on the observed values of each time series. Because of
the number of tests we performed, we used a rather strict alpha of p < .01.
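The correlation between the girls pairs their scores measurement by measurement. A minimal pure-Python sketch of Pearson's r is given below, together with the t statistic (on n - 2 degrees of freedom) from which the usual two-sided p-value is derived; the function names are hypothetical. In practice, a library routine such as `scipy.stats.pearsonr` returns r and the two-sided p-value directly, which would then be compared with the alpha of .01.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation of two series paired by measurement occasion."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two paired series of equal length (n >= 3)")
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: rho = 0; compare with a t distribution on n - 2 df."""
    if abs(r) >= 1:
        raise ValueError("t is undefined for a perfect correlation")
    return r * sqrt((n - 2) / (1 - r ** 2))
```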

6. Results

6.1. MLT/written

When visually inspecting the trajectories of MLT/written, it stands out that Grace
seems to be much more proficient than Gloria (see Figure 3). Both girls start out
at a reasonably proficient level at the beginning of the measurement period, but
only Grace seems to increase during the measurements. The figure also shows a large
degree of intra-individual variability, with several peaks, especially for Grace.

The Monte Carlo analysis confirmed the difference between the girls in
global MLT/written. The average of Gloria is 9.967, the average of Grace is
12.866, and this difference is significant (p < .001). This means that Grace is
generally more proficient than her sister. The results further showed that both
slopes are positive and significant (Gloria: slope = 0.012, p = .002; Grace: slope
= 0.032, p = .001), which means that both girls show an increase in MLT/written
over time, though Grace's slope is steeper. With regard to intra-individual
variability, the local range of Grace is larger (Gloria has an average of 0.004, Grace
0.031). This difference tested to be significant (p < .0001), indicating that Grace
has more variability overall. We also tested whether this range increases or
decreases over time (indicating a global change in the amount of variability). The
results show that both slopes are positive (0.012 for Gloria and 0.032 for Grace), but
the increase was only significant for Grace (p = .122 for Gloria, p = .009 for Grace).

Combined, the results show that the trajectories of the girls are rather
dissimilar: Grace is more proficient, has a steeper increase, and has more
variability than Gloria. Her variability is also increasing over time, which is not the
case for Gloria.



Figure 3. Written MLT for both Gloria (grey) and Grace (black)

6.2. MLT/spoken

Visual inspection of the MLT/spoken data suggests that the trajectories of the girls
largely overlap (see Figure 4). Again, we observe relatively high levels of
proficiency at the start of the observations and much intra-individual variability from
measurement to measurement. However, the variability seems to be more
concentrated in the first half of the measurement period and to decrease over time.

The results of the Monte Carlo analyses show a small difference in spoken
MLT (13.148 for Gloria and 14.204 for Grace), which almost reaches significance
(p = .011). Furthermore, the slope of Grace was significantly negative (-0.031,
p = .006), whereas that of Gloria was nonsignificant (0.018, p = .436). This means
that Gloria's performance is relatively stable over time and that there is a slight but
significant decrease for Grace. With regard to the amount of variability, the analyses
show that Grace generally has more variability over the entire trajectory (Gloria
has an average local range of 6.610 and Grace of 7.972, p = .005) and that the
amount of intra-individual variability decreases over time for both. The slopes
of the local ranges are negative and significant for both girls (-0.0474 for Gloria
and -0.079 for Grace; p < .001 in both cases).

Combined, this shows that there is no general increase in proficiency in
spoken syntactic development; instead, both girls seem to stabilize.
Grace's performance is somewhat more variable from moment to moment.

Figure 4. Spoken MLT for both Gloria (grey) and Grace (black)

6.3. VocD/written

When visually inspecting the trajectories of written VocD, the values for Gloria 
seem to be generally somewhat higher (see Figure 5). Again, no clear increase 
over time can be detected and Grace even seems to decrease over time. The 
amount of variability is also large again. Notably, Grace's variability seems to 
drop after Measurement 57.

The Monte Carlo analysis confirmed that Gloria has a higher general level
of proficiency. The average for Gloria was 60.918 and for Grace 53.879, and this
difference was significant (p < .001). Both slopes are negative (-0.052 for Gloria
and -0.126 for Grace), but only Grace's is significant (p values are .898 and
.001, respectively). This means that Grace's proficiency is decreasing over time.
With regard to variability, no differences were found (the local range for Grace
was 27.172 and for Gloria 27.595; p = .615). For both girls, there is a negative
trend in local variability, indicating a general decrease of variability (Gloria's
slope is -0.004 and Grace's is -0.132), but only Grace's is significant (p values
are .548 and < .001, respectively). This means that only Grace is decreasing in
her variability, indicating that her level is stabilizing.

Together, the results are somewhat different for each of the girls: Grace 
has a relatively low level and is decreasing over time. In addition, her variability 
is decreasing. Though Gloria generally shows the same patterns, they were 
much less pronounced and did not reach significance. 

Figure 5. Written VocD for both Gloria (grey) and Grace (black)


6.4. VocD/spoken

Finally, for VocD/spoken, Gloria seems to be slightly more proficient than Grace,
especially at the beginning of the trajectory (see Figure 6). Visual inspection also
suggests a positive trend for Grace, and it looks like she is "catching up" with her
sister. Variability is clearly present in VocD/spoken as well, but it is hard to
distinguish a clear trend in it.



Figure 6. Spoken VocD for both Gloria (grey) and Grace (black)

The Monte Carlo analyses show that Gloria's proficiency is indeed higher
than her sister's (the average for Gloria is 42.626 and for Grace 38.580; p = .001).
Furthermore, only Grace has a significant positive slope (0.090, p = .002);
Gloria does not (-0.018, p = .711). With regard to the amount of variability, there
is no difference between the girls (the average local range for Gloria is 19.158
and for Grace 17.578, p = .067). In addition, the slopes of the variability
differ between the individuals: Gloria's variability is decreasing across time
(-0.081, p < .001), whereas Grace's is increasing (0.078, p = .003).

In combination, these results show clear differences between the two
girls: Grace is the one who is showing signs of development (an increase in level
and an increase in variability), whereas Gloria, who has an initially higher level of
proficiency, only seems to stabilize over time.

6.5. Correlations 

When looking at the statistical associations between the data for the two girls, the
results show only one significant, moderate correlation between Gloria and Grace,
for written VocD (r = .297, p = .003), and one trend towards a moderate correlation
for written MLT (r = .243, p = .015). The other correlations are not significant
(see Table 3).


Table 3. Correlations between Gloria and Grace for all variables

Variable          r        p
MLT/written       0.243    0.015
MLT/spoken       -0.124    0.219
VocD/written      0.297*   0.003
VocD/spoken       0.110    0.278

* p < .01


7. Discussion and conclusion 

Table 4 summarizes the findings of our study. There are significant differences
between the twins in both the degree of development and the degree of variability
for the two variables in the two modes. The row "Higher average scores" indicates
whether one of the girls is more proficient than the other across the entire
measurement period. "Significant increase or decrease in score over time" refers to a
global trend in proficiency across time. "More variability in scores" refers to the
average degree of variability across the entire trajectory. "Significant increase or
decrease in degree of variability over time" refers to a change in the amount of
variability across time, that is, whether the local variability becomes larger or
smaller as time goes on.

Table 4. Summary of the findings of the study

                                 MLT                             VocD
                                 Written         Spoken          Written         Spoken
                                 Gloria  Grace   Gloria  Grace   Gloria  Grace   Gloria  Grace
Higher average scores                    X                       X               X
Significant increase or          Xpos    Xpos            Xneg            Xneg            Xpos
decrease in score over time
More variability in scores               X               X
Significant increase or                  Xpos    Xneg    Xneg            Xneg    Xneg    Xpos
decrease in degree of
variability over time

Note. X marks a significant result; Xpos and Xneg mark significant positive and negative slopes, respectively.

We observe that the patterns are dissimilar in many cases. Grace is obviously
changing, but not always in the expected direction, and she changes in different
directions in written and spoken production. She improves in written MLT but
decreases in spoken MLT; she decreases in written VocD but improves in spoken VocD.
She also has significantly more variability than her sister in both MLT scores. Gloria
changes very little over time, although she increases in written MLT. She does not
change in the other variables and seems to have stabilized, which is accompanied by a
decreasing amount of variability in spoken MLT and spoken VocD. The correlation
analyses also showed that none of the variables strongly correlated with each other
over time within each learner.

Our main research question was whether the degrees of variability in
individuals might correlate with L2 success in individual learners. Table 4 shows
that there is no direct one-to-one relation between variability and change, but we
may tentatively conclude that without a certain degree of variability there is little
L2 change. In our data, if there is an increase or decrease, it is usually accompanied
by relatively higher degrees of overall variability, and this can also be seen in the
direction of the slopes of variability. Variability does not guarantee success, but it
does strongly seem to be a prerequisite for change to take place.

How do these case studies relate to the group studies on IDs? First of all,
even though we controlled for as many factors as humanly possible (age of onset,
general aptitude, general personality types, and so on) by investigating identical
twins learning the L2 in the same environment and doing the same tasks over time,
we do see remarkable differences. One of the twins is changing rather erratically
in all the measures, whereas the other is not. Could this have been because
Grace is slightly more motivated or anxious than her sister? Even if that were so,
it would not explain the opposite patterns for spoken and written variables.

When we link these observations to the argument in Molenaar (2015), the
conclusion we can draw from our study is that Molenaar's mathematically based
assumptions that observations in the time dimension need to be person specific
are confirmed in the analysis of behavioral data of second language
development. In this study we have explored the interacting variables in the
development of two individuals, as manifested in the amount of variability and
its timing. In other words, we have investigated interacting variables at two
individual slices in the time dimension of Cattell's cube. In doing this, we made
sure that the individual learners were maximally similar to optimize the
comparability of the data. The conclusion is that in spite of the similarity of the
cases achieved by minimizing IDs, very clear differences in process characteristics
were found between the individual cases. This is clearly found in the data and
confirmed by the correlation analysis of speaking and writing measures
between the twins: only one significant, though weak, correlation was found.

The study of the effect of IDs on second language acquisition can focus
on two dimensions. On the one hand, we find evidence for the relevance of
several personal characteristics that have been marked as IDs, such as motivation,
aptitude, anxiety, personality, etcetera, as these variables have been shown to be
significantly related to achievement in second language acquisition. It has been
argued that the individual variables that have been related to second language
acquisition are neither monolithic nor stable, which casts some doubt on their
value when they are measured at one point in time. On the other hand, there is
the undervalued dimension of the individual's process of development. The
patterns of development emerging in individual processes are at least as revealing
as the global associations coming to light in the analysis of groups of learners. In
this paper we have argued that these two approaches comprise complementary
perspectives, as they represent different dimensions of Cattell's cube (Molenaar,
2015). The relevance of the distinction between these dimensions was
corroborated in our study of identical twins, since even for identical twins who
learn the language in identical environments, the interacting variables of language
development as it emerges over time are essentially different between these learners.

Many studies have attempted to crack the code to success in L2 learning
by identifying IDs that predict high achievement.
However, the study of global differences between learners is not the only way
to identify IDs, as these differences can also relate to the process of learning.
This process is best studied by following individual development over time. Our
study of identical twins has illustrated that a focus on variability can reveal
relevant and interesting differences in the individual learning process. Ideally, when
advanced statistics allow us to do so, future studies should trace the interactions
between variables like motivation, aptitude and achievement as they affect the
process of individual development over time for groups of learners with
different backgrounds and in different settings. Until that time, we should
acknowledge that different dimensions of behavior need to be studied, and that
the study of the process of individual development over time is at least as
revealing as group studies concentrating on interacting factors at one point in
time. As van Geert (2011, p. 276) argues, "a theory of development is a theory of
change, which explains how basic developmental mechanisms can generate
specific developmental patterns". Such a theory can provide predictions and
models of developmental trajectories that single-case studies can fruitfully examine.